# End-to-end speech model
Voila Chat
MIT
Voila is a series of large-scale speech-language foundation models designed to improve human-computer interaction.
Text-to-Audio
Transformers, Supports Multiple Languages

maitrix-org
2,423
32
Llama3.1 Typhoon2 Audio 8b Instruct
Typhoon 2-Audio Edition is an end-to-end speech-to-speech model that processes audio, speech, and text inputs and generates both text and speech outputs simultaneously. It is optimized for Thai and also supports English.
Text-to-Audio
Transformers, Supports Multiple Languages

scb10x
664
9
Flow Mirror
Apache-2.0
FlowMirror is an end-to-end speech model developed by Zhejiang Jingzhunxue AI Lab. It supports tasks such as voice dialogue, ASR, and TTS, with a focus on educational applications.
Text-to-Audio
Transformers

jzx-ai-lab
21
2
Mms Tts Vie
A Vietnamese text-to-speech model developed by Meta, based on the VITS architecture.
Speech Synthesis
Transformers

facebook
3,616
27
W2v Timit Ft 4001
A speech recognition model based on the Wav2Vec 2.0 architecture, fine-tuned on the TIMIT dataset and suited to English speech-to-text tasks.
Speech Recognition
Transformers

devin132
22
0